
HDU -> FITS #1107

Merged

rhayes777 merged 32 commits into main from feature/aggregate_hdu on Mar 14, 2025
Conversation

@rhayes777
Collaborator

Doesn't implement HDU splicing, but implements a crucial fix that loads the whole FITS file rather than just the primary HDU. This may not be completely backwards compatible.

@rhayes777
Collaborator Author

This PR now also includes AggregateFITS.

Usage:

```python
aggregate_fits = af.AggregateFITS(aggregator)

# Create a FITS file (note the PrimaryHDU is empty)
hdu_list = aggregate_fits.extract_fits(
    af.FitFITS.ModelImage,
    af.FitFITS.ResidualMap,
)

# Output to a folder. This creates one FITS file for each search output.
# A 'name' argument can be passed to define which attribute of the
# search is used to name the file.
aggregate_fits.output_to_folder(
    path_to_folder,
    af.FitFITS.ModelImage,
    af.FitFITS.ResidualMap,
)
```
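For reference, the object returned by extract_fits is a standard astropy HDUList, so it can also be inspected or written out directly. A minimal sketch with stand-in data (the HDU names and output path here are illustrative, not from PyAutoFit):

```python
# Minimal sketch: an HDUList like the one extract_fits returns,
# written to disk with astropy. The names and data are stand-ins.
import numpy as np
from astropy.io import fits

hdu_list = fits.HDUList([
    fits.PrimaryHDU(),  # empty primary, as noted above
    fits.ImageHDU(data=np.zeros((2, 2)), name="MODEL_IMAGE"),
    fits.ImageHDU(data=np.zeros((2, 2)), name="RESIDUAL_MAP"),
])
hdu_list.writeto("spliced.fits", overwrite=True)
```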


`class FitFITS(Enum):`
Collaborator Author

These need to be implemented for other FITS files. For example, galaxy_images.fits can be included by defining GalaxyImagesFITS.

@Jammy2211
Collaborator

I updated the example here:

https://github.com/Jammy2211/autogalaxy_workspace/blob/main/scripts/results/examples/fits_splice.py

But have two issues.

Firstly, I can run this code:

```python
hdu_list = agg_fits.extract_fits(
    af.FitFITS.ModelImage,
    af.FitFITS.ResidualMap,
)
```

But output_to_folder does not use hdu_list, so it seems pointless?

```python
aggregate_fits.output_to_folder(
    path_to_folder,
    af.FitFITS.ModelImage,
    af.FitFITS.ResidualMap,
)
```

I see that self._hdus is used internally, but why pass *hdus in to both functions?

Secondly, I don't follow the name input of output_to_folder, which I tried to use below:

```python
agg_fits.output_to_folder(
    Path("."),
    hdu_list,
    name="fits",
)
```

But got this error:

```
Aggregator loading search_outputs... could take some time.

 A total of 2 search_outputs and results were found.
Traceback (most recent call last):
  File "/mnt/c/Users/Jammy/Code/PyAuto/autogalaxy_workspace/scripts/results/examples/fits_splice.py", line 167, in <module>
    agg_fits.output_to_folder(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/aggregator/summary/aggregate_fits.py", line 130, in output_to_folder
    ).writeto(folder / name)
  File "/home/jammy/venvs/PyAuto/lib/python3.10/site-packages/astropy/io/fits/hdu/hdulist.py", line 1032, in writeto
    fileobj = _File(fileobj, mode=mode, overwrite=overwrite)
  File "/home/jammy/venvs/PyAuto/lib/python3.10/site-packages/astropy/io/fits/file.py", line 218, in __init__
    self._open_filename(fileobj, mode, overwrite)
  File "/home/jammy/venvs/PyAuto/lib/python3.10/site-packages/astropy/io/fits/file.py", line 651, in _open_filename
    self._file = open(self.name, IO_FITS_MODES[mode])
FileNotFoundError: [Errno 2] No such file or directory: '[<FITSOutput dataset.data (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/files/dataset/data.fits)>, <FITSOutput dataset.mask (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/files/dataset/mask.fits)>, <FITSOutput dataset.noise_map (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/files/dataset/noise_map.fits)>, <FITSOutput dataset.over_sample_size_lp (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/files/dataset/over_sample_size_lp.fits)>, <FITSOutput dataset.over_sample_size_pixelization (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/files/dataset/over_sample_size_pixelization.fits)>, <FITSOutput dataset.psf (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/files/dataset/psf.fits)>, <FITSOutput fit (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/image/fit.fits)>, <FITSOutput galaxy_images (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/image/galaxy_images.fits)>, <FITSOutput model_galaxy_images (output/results_folder_csv/simple_0/results/441ad79fb6081c6f9ee07d3fff9eb838/image/model_galaxy_images.fits)>].fits'
```

I think the behaviour should be the following:

```python
agg_fits.output_to_folder(
    Path("."),
    hdu_list,
    name="unique_tag",
)
```

So here, I am telling it to use the unique_tag of each search to name each output .fits file, which in this example would be simple_0 and simple_1. This should be the default behaviour.

If unique_tag is not found, it should raise an exception and require that the user input a list of names with the same size as the number of FITS files to be created:

```python
agg_fits.output_to_folder(
    Path("."),
    hdu_list,
    name=[f"file_name_{i}" for i in range(2)],
)
```
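The proposed behaviour could be sketched as follows (a hypothetical helper for illustration, not the actual PyAutoFit implementation):

```python
# Hypothetical sketch of the proposed naming behaviour: use a search
# attribute (unique_tag by default), or accept an explicit list of names.
from types import SimpleNamespace


def filenames_for_outputs(searches, name="unique_tag"):
    """Resolve one output filename per search.

    `name` is either a search attribute (e.g. "unique_tag") or an
    explicit list with one entry per search.
    """
    if isinstance(name, list):
        if len(name) != len(searches):
            raise ValueError(f"Expected {len(searches)} names, got {len(name)}")
        return [f"{n}.fits" for n in name]
    try:
        return [f"{getattr(search, name)}.fits" for search in searches]
    except AttributeError:
        raise ValueError(
            f"Searches have no '{name}' attribute; pass an explicit list of names"
        )


searches = [SimpleNamespace(unique_tag="simple_0"),
            SimpleNamespace(unique_tag="simple_1")]
print(filenames_for_outputs(searches))  # ['simple_0.fits', 'simple_1.fits']
```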

@Jammy2211
Copy link
Copy Markdown
Collaborator

The PR also required me to update PyAutoGalaxy, but I didn't fix the aggregator-related unit tests:

PyAutoLabs/PyAutoGalaxy#220

@rhayes777
Collaborator Author

self._hdus is a method that extracts the HDUs corresponding to the enum values. extract_fits takes enum arguments and returns a single FITS object comprising all matching HDUs from all searches. output_to_folder takes enum arguments and creates one FITS file per search, containing the HDUs from that search which correspond to the enum arguments. This is exactly the same behaviour as provided by AggregateImages.
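The distinction between the two methods can be sketched with plain dicts standing in for searches and HDUs (hypothetical stand-ins for illustration only):

```python
# Hypothetical stand-ins: each "search" maps HDU names to data.
def extract_fits(searches, *hdu_names):
    """One combined list: an empty primary plus matching HDUs from every search."""
    combined = ["PRIMARY"]
    for search in searches:
        combined += [search[name] for name in hdu_names]
    return combined


def output_to_folder(searches, *hdu_names):
    """One output per search, each holding only that search's matching HDUs."""
    return {
        search["unique_tag"]: [search[name] for name in hdu_names]
        for search in searches
    }


searches = [
    {"unique_tag": "simple_0", "model_image": "img0", "residual_map": "res0"},
    {"unique_tag": "simple_1", "model_image": "img1", "residual_map": "res1"},
]
print(extract_fits(searches, "model_image", "residual_map"))
# ['PRIMARY', 'img0', 'res0', 'img1', 'res1']
print(output_to_folder(searches, "model_image", "residual_map"))
# {'simple_0': ['img0', 'res0'], 'simple_1': ['img1', 'res1']}
```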

@rhayes777
Collaborator Author

I've fixed the error. Currently both AggregateImages and AggregateFITS use the 'name' for output. We can change that to the unique_tag if you think that's better?

@Jammy2211
Collaborator

I think we should make name have no default argument, so a user knows that unique_tag is being used.

We should rename the variable, as name makes me think it's the name of the file, but it's not; it refers to the search property. You can probably think of something more explicit than this, but we could go for search_attribute_for_name? Maybe something more concise...

I think we should add a method which allows a user to see the full list of names that will be used, so something like:

```python
print(agg_fits.filename_list_from(search_attribute_for_name="unique_tag"))
```

Maybe output_to_folder could receive filename_list as input instead of search_attribute_for_name and users can do it in two steps, allowing for more customization?

@rhayes777
Collaborator Author

> I think we should make name have no default argument, so a user knows that unique_tag is being used.
>
> We should rename the variable, as name makes me think it's the name of the file, but it's not; it refers to the search property. You can probably think of something more explicit than this, but we could go for search_attribute_for_name? Maybe something more concise...
>
> I think we should add a method which allows a user to see the full list of names that will be used, so something like:
>
> `print(agg_fits.filename_list_from(search_attribute_for_name="unique_tag"))`
>
> Maybe output_to_folder could receive filename_list as input instead of search_attribute_for_name and users can do it in two steps, allowing for more customization?

Will we definitely need a custom list? It seems like overkill. We could log which names are being used. I can only see a user wanting to check this beforehand if the aggregation is going to take a considerable length of time.

@Jammy2211
Collaborator

The problem I see is that if a user didn't set up unique_tag to be dataset names, it could be a pain for them to make a list that uniquely names every file in an informative way.

For example, they could use the name of the search, or some splicing of path_prefix, but it's hard to really be sure?

I am happy to avoid a custom list for now if it's particularly burdensome for the source code.

@rhayes777
Collaborator Author

No, it's not so bad. There's always the identifier, which should be unique per search but isn't human readable.

@rhayes777
Collaborator Author

I guess another point is: do they even know what order the searches are in? If not, then a list doesn't make a lot of sense.

@Jammy2211
Collaborator

I guess they'd have to use the list to work out the order and then customize it lol.

I am happy to avoid the list; I will make sure documentation and the like means a user understands that they need to set up unique_tag carefully (or can use the human-unreadable identifier as an alternative).

@Jammy2211
Collaborator

I wouldn't have a default value though, as I think forcing the user to think about this is good.

@rhayes777
Collaborator Author

Cool, will remove the default.

@rhayes777 rhayes777 merged commit 1082349 into main Mar 14, 2025
0 of 4 checks passed
@rhayes777 rhayes777 deleted the feature/aggregate_hdu branch March 14, 2025 13:02